416 research outputs found

A network approach for managing and processing big cancer data in clouds

Author: D Hanahan
Dimitrios Tsoumakos
EM Zdobnov
L Wang
L Wang
L Wang
M Lawrence
Moustafa Ghanem
R Chen
RA Weinberg
Wei Jie
Wei Xing
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2015

Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. The data are decentralised, growing and continually updated, and the content held or archived in different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach for data processing and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for the management and processing of big cancer data.

Crossref

UWL Repository

An XML transfer schema for exchange of genomic and genetic mapping data: implementation as a web service in a Taverna workflow

Author: Andy Law
EM Zdobnov
F Achard
J Kennedy
K-H Cheung
LD Stein
RC Geer
S Philippi
Trevor Paterson
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Genomic analysis, particularly for less well-characterized organisms, is greatly assisted by performing comparative analyses between different types of genome maps and across species boundaries. Various providers publish a plethora of on-line resources collating genome mapping data from a multitude of species. Datasources range in scale and scope from small bespoke resources for particular organisms, through larger web-resources containing data from multiple species, to large-scale bioinformatics resources providing access to data derived from genome projects for model and non-model organisms. The heterogeneity of information held in these resources reflects both the technologies used to generate the data and the target users of each resource. Currently there is no common information exchange standard or protocol to enable access and integration of these disparate resources. Consequently data integration and comparison must be performed in an ad hoc manner.
Results: We have developed a simple generic XML schema (GenomicMappingData.xsd – GMD) to allow export and exchange of mapping data in a common lightweight XML document format. This schema represents the various types of data objects commonly described across mapping datasources and provides a mechanism for recording relationships between data objects. The schema is sufficiently generic to allow representation of any map type (for example genetic linkage maps, radiation hybrid maps, sequence maps and physical maps). It also provides mechanisms for recording data provenance and for cross referencing external datasources (including for example ENSEMBL, PubMed and GenBank). The schema is extensible via the inclusion of additional datatypes, which can be achieved by importing further schemas, e.g. a schema defining relationship types. We have built demonstration web services that export data from our ArkDB database according to the GMD schema, facilitating the integration of data retrieval into Taverna workflows.
Conclusion: The data exchange standard we present here provides a useful generic format for transfer and integration of genomic and genetic mapping data. The extensibility of our schema allows for inclusion of additional data and provides a mechanism for typing mapping objects via third party standards. Web services retrieving GMD-compliant mapping data demonstrate that use of this exchange standard provides a practical mechanism for achieving data integration, by facilitating syntactically and semantically-controlled access to the data.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Advancing Genomics with OrthoDB, BUSCO, and the LEM Framework

Author: Berkeley M.
Kriventseva EV
Kuznetsov D.
Manni M.
Seppey M.
Tegenfeldt F.
Zdobnov EM.
Publication venue: Belgrade : Institute of molecular genetics and genetic engineering
Publication date: 01/01/2023

The rapid growth of genomics data necessitates continuous advancements in bioinformatics tools. This presentation highlights the latest updates to our toolbox, including OrthoDB v11, BUSCO v5, and the LEM benchmarking framework. OrthoDB (https://www.orthodb.org) is a leading resource for gene orthology and functional annotations across diverse eukaryotes, prokaryotes, and viruses. Orthology facilitates precise bridging of gene function knowledge within the genomics sphere. OrthoDB v11 encompasses over 100 million genes from 18,000 prokaryotes and nearly 2,000 eukaryotes, providing extensive species coverage. The open-source OrthoLoger software (https://orthologer.ezlab.org) allows mapping of novel gene sets to precomputed orthologs, linking them to relevant annotations. BUSCO (https://busco.ezlab.org) serves as a standard tool for assessing the completeness of genome assemblies, transcriptomes, and predicted gene sets, complementing assembly contiguity measures like N50 values. A spin-off of OrthoDB, BUSCO evaluates the presence and coverage of marker genes, offering an evolutionarily-grounded expectation of gene content completeness. BUSCO v5 now automatically selects the most suitable dataset for evaluation, outperforming the popular CheckM tool. Its efficiency is particularly evident in large eukaryotic genomes, and it is uniquely capable of assessing both eukaryotic and prokaryotic species, making it applicable to metagenome-assembled genomes of unknown origin. The LEMMI (https://lemmi.ezlab.org) benchmarking framework, now in version 2, facilitates informed software tool selection. This Live Evaluation of Methods (LEM) for Metagenome Investigation uses a container-based approach for continuous benchmarking and effective end-user distribution. The versatile framework can be extended to other procedures, such as gene orthology inference with LEMOrtho (https://lemortho.ezlab.org). 
The LEM benchmarking approach aims to become a community-driven effort, allowing developers to showcase novel methods and users to access standardized, easy-to-use software. We encourage researchers to apply this framework in their domain and welcome feedback. Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 202

imagine

A comparison of common programming languages used in bioinformatics

Author: A Conesa
AB Clegg
D Butt
D Posada
EM Zdobnov
GPS Raghava
H Mangalam
L Prechelt
LJ McGuffin
Mathieu Fourment
Michael R Gillings
MK Kuhner
N Saitou
RA Irizarry
S Guindon
SF Altschul
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python.
Results: Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/.
Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Macquarie University ResearchOnline

annot8r: GO, EC and KEGG annotation of EST datasets

Author: A Bairoch
A Conesa
A Papanicolaou
DM Martin
E Camon
EM Zdobnov
J Bai
J Parkinson
J Parkinson
JD Wasmuth
JE Stajich
LB Koski
M Ashburner
M Kanehisa
Mark L Blaxter
MS Boguski
Ralf Schmid
SF Altschul
SR Stürzenbaum
The UniProt Consortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways.
Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and also in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools.
Conclusion: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Leicester Research Archive

Quail Genomics: a knowledgebase for Northern bobwhite

Author: A Darling
AK Hudek
Arun Rawat
Arun Rawat
B Langmead
C Iseli
Edward J Perkins
EM Zdobnov
J Balthazart
JR Sauer
KA Gust
Kurt A Gust
M Ashburner
MA Harris
MJ Quinn
Mohamed O Elasri
MS Johnson
SC Potter
SH Nagaraj
TM Crowley
W Carre
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The Quail Genomics knowledgebase (http://www.quailgenomics.info) has been initiated to share and develop functional genomic data for Northern bobwhite (Colinus virginianus). This web-based platform has been designed to allow researchers to perform analysis and curate genomic information for this non-model species that has little supporting information in GenBank.
Description: A multi-tissue, normalized cDNA library generated for Northern bobwhite was sequenced using 454 Life Sciences next generation sequencing. The Quail Genomics knowledgebase represents the 478,142 raw ESTs generated from the sequencing effort in addition to assembled nucleotide and protein sequences including 21,980 unigenes annotated with meta-data. A normalized MySQL relational database was established to provide comprehensive search parameters where meta-data can be retrieved using functional and structural information annotation such as gene name, pathways and protein domain. Additionally, BLAST hit cutoff levels and microarray expression data are available for batch searches. A Gene Ontology (GO) browser from AmiGO is locally hosted providing 8,825 unigenes that are putative orthologs to chicken genes. In an effort to address overabundance of Northern bobwhite unigenes (71,384) caused by non-overlapping contigs and singletons, we have built a pipeline that generates scaffolds/supercontigs by aligning partial sequence fragments against the indexed protein database of chicken to build longer sequences that can be visualized in a web browser.
Conclusion: Our effort provides a central repository for storage and a platform for functional interrogation of the Northern bobwhite sequences providing comprehensive GO annotations, meta-data and a scaffold building pipeline. The Quail Genomics knowledgebase will be integrated with Japanese quail (Coturnix coturnix) data in future builds and incorporate a broader platform for these avian species.

Aquila Digital Community

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evidence for a novel coding sequence overlapping the 5'-terminal ~90 codons of the Gill-associated and Yellow head okavirus envelope glycoprotein gene

Author: AE Firth
AE Firth
AE Firth
AE Firth
AE Gorbalenya
Andrew E Firth
AO Pasternak
BYW Chung
C Rancurel
D Matsuda
EM Zdobnov
JA Cowley
JA Cowley
JA Cowley
JD Bendtsen
JD Thompson
John F Atkins
N Sittidilokratna
N Sittidilokratna
PK Wijegoonawardane
PK Wijegoonawardane
R Belshaw
S Jitrapakdee
SF Altschul
W Gangnonngiw
Publication venue: BioMed Central
Publication date: 01/01/2009

The genus Okavirus (order Nidovirales) includes a number of viruses that infect crustaceans, causing major losses in the shrimp industry. These viruses have a linear positive-sense ssRNA genome of ~26-27 kb, encoding a large replicase polyprotein that is expressed from the genomic RNA, and several additional proteins that are expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the envelope glycoprotein encoding sequence, ORF3, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF3. We propose that translation of the new ORF initiates at a conserved AUG codon separated by just 2 nt from the ORF3 AUG initiation codon, resulting in a novel 86 amino acid protein.

Crossref

Springer - Publisher Connector

PubMed Central

Intron Dynamics in Ribosomal Protein Genes

Author: A Nakao
AG Russell
CJ Venter
D Brett
DC Jeffares
DH Nguyen
EM Zdobnov
ES Maxwell
FU Battistuzzi
H Philippe
HD Nguyen
Hung D. Nguyen
IB Rogozin
IG Wool
J Felsenstein
JD Thompson
JE Nixon
JS Mattick
KT Tycowski
M Csürös
M Yoshihama
M Yoshihama
Maki Yoshihama
N Kenmochi
Naoya Kenmochi
Oliver Hofmann
P Andolfatto
SB Hedges
SW Roy
SW Roy
VN Babenko
Publication venue: Public Library of Science
Publication date: 03/01/2007

The role of spliceosomal introns in eukaryotic genomes remains obscure. A large scale analysis of intron presence/absence patterns in many gene families and species is a necessary step to clarify the role of these introns. In this analysis, we used a maximum likelihood method to reconstruct the evolution of 2,961 introns in a dataset of 76 ribosomal protein genes from 22 eukaryotes and validated the results by a maximum parsimony method. Our results show that the trends of intron gain and loss differed across species in a given kingdom but appeared to be consistent within subphyla. Most subphyla in the dataset diverged around 1 billion years ago, when the “Big Bang” radiation occurred. We speculate that spliceosomal introns may play a role in the explosion of many eukaryotes at the Big Bang radiation.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications.

Author: A Morgulis
Ana S H Costa
BA Braaten
BF Vanyushin
C Zang
Charles R Bradshaw
Christian Frezza
D Dominissini
D Roberts
E Ford
EL Greer
EM Reyes
EM Zdobnov
F Laengle-Rouault
G Bertani
G Zhang
GE Geier
George E Allen
H Kobayashi
H Li
J Boyes
JB Bowes
John B Gurdon
KS Pollard
L Micallef
M Lawrence
M Lu
Magdalena J Koziol
MG Marinus
MS May
PA Jones
PD Lawley
PD Thomas
PH Kay
RD Hotchkiss
S Ito
SE Luria
T Pfaffeneder
TL Bailey
TW Munns
U Gunthert
WA Cantara
WJ Kent
Y Fu
Y Gruenbaum
Publication venue: Nat Struct Mol Biol
Publication date: 21/12/2015

Methylation of cytosine deoxynucleotides generates 5-methylcytosine (m(5)dC), a well-established epigenetic mark. However, in higher eukaryotes much less is known about modifications affecting other deoxynucleotides. Here, we report the detection of N(6)-methyldeoxyadenosine (m(6)dA) in vertebrate DNA, specifically in Xenopus laevis but also in other species including mouse and human. Our methylome analysis reveals that m(6)dA is widely distributed across the eukaryotic genome and is present in different cell types but is commonly depleted from gene exons. Thus, direct DNA modifications might be more widespread than previously thought.
M.J.K. was supported by the Long-Term Human Frontiers Fellowship (LT000149/2010-L), the Medical Research Council grant (G1001690), and by the Isaac Newton Trust Fellowship (R G76588). The work was sponsored by the Biotechnology and Biological Sciences Research Council grant BB/M022994/1 (J.B.G. and M.J.K.). The Gurdon laboratory is funded by the grant 101050/Z/13/Z (J.B.G.) from the Wellcome Trust, and is supported by the Gurdon Institute core grants, namely by the Wellcome Trust Core Grant (092096/Z/10/Z) and by the Cancer Research UK Grant (C6946/A14492). C.R.B. and G.E.A. are funded by the Wellcome Trust Core Grant. We are grateful to D. Simpson and R. Jones-Green for preparing X. laevis eggs and oocytes, F. Miller for providing us with M. musculus tissue, T. Dyl for X. laevis eggs and D. rerio samples, and to Gurdon laboratory members for their critical comments. We thank U. Ruether for providing us with M. musculus kidney DNA (Entwicklungs- und Molekularbiologie der Tiere, Heinrich Heine Universitaet Duesseldorf, Germany). We also thank J. Ahringer, S. Jackson, A. Bannister and T. Kouzarides for critical input and advice, M. Sciacovelli and E. Gaude for suggestions.
This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nsmb.314

Crossref

PubMed Central

Apollo (Cambridge)